Cache CLI extractor paths across Actions steps#3950
Conversation
henrymercer
left a comment
There was a problem hiding this comment.
Caching these invocations makes a lot of sense! I have a high level comment and a couple of lower level comments.
The main point is that now that we're caching multiple invocations, it might be a good opportunity to generalise the design. For instance, you could imagine something like:
const versionCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_VERSION_INFO, validate: isVersionInfo });
const resolveLanguagesCache = createPersistedCliCache({ envVar: EnvVar.CODEQL_RESOLVE_LANGUAGES, validate: isResolveLanguagesOutput });where createPersistedCliCache handles memoising in the Action and persisting between Actions steps with an environment variable.
Some smaller things:
- Ideally the cache entry would also depend on
getExtraOptionsFromEnv(["resolve","languages"]) - We should remove the cache in
testing-utils.tslike we do for the CodeQL version cache
mbg
left a comment
There was a problem hiding this comment.
I agree with @henrymercer's comments regarding a more generalised design for this. I am wondering about the use of environment variables here vs using a file on disk. I don't know if you have already considered this, but we store e.g. the Action configuration on disk as a file. Perhaps that would make sense for these cached CLI results as well.
A general point: could we also make sure to add doc comments for new top-level definitions before merging?
Repeated calls to `resolveLanguages()` will only pay the performance penalty of executing `codeql resolve languages` once.
By wrapping `resolveLanguages()`, which is memoized, we can avoid executing `codeql resolve extractor` several times over the course of an analysis.
This commit adds a `number` validator`, an `object` validator, an `isNumber` predicate, and `undefinable()` to test optional-but-not-null properties.
This provides a separation of concerns between the memoization and the execution.
c218fd6 to
b18df17
Compare
|
I've taken your comments into consideration and overhauled the design to be more comprehensive and unified. The design now backs to a temporary file instead of the environment. I also identified a few opportunities to refactor some duplicated code into helper functions. I kept the use of |
There was a problem hiding this comment.
Warning
- Copilot's review of this pull request may be incomplete because some of the changed files are excluded by your Copilot content exclusion settings. See Excluding content from Copilot for details.
Pull request overview
This PR introduces a cross-step cache for selected CodeQL CLI command outputs (notably codeql version and codeql resolve languages) to reduce repeated JVM startups and improve performance across GitHub Actions steps. It also refactors extractor resolution to derive extractor roots from resolve languages (reusing the cached output) and extends the internal JSON validation helpers to support stronger runtime validation of CLI JSON output.
Changes:
- Add a new 2-tier command-output cache (in-memory + temp-file) and wire it into
codeql.tsforversionandresolve languages. - Refactor
resolveExtractor()to useresolveLanguages()rather than invokingcodeql resolve extractor. - Extend
src/jsonvalidation helpers (number/object validators andundefinable) and add unit tests; remove now-obsoleteutil-based version cache.
Show a summary per file
| File | Description |
|---|---|
| src/util.ts | Removes the prior in-process/env-var version cache helpers. |
| src/util.test.ts | Removes tests for the old version-caching behavior. |
| src/testing-utils.ts | Updates test setup to reset the new command-output cache between tests. |
| src/status-report.ts | Switches telemetry version lookup to the new cache + isVersionInfo guard. |
| src/json/index.ts | Adds number, object, and undefinable validators to support schema checks. |
| src/json/index.test.ts | Adds tests for undefinable semantics (rejecting null). |
| src/environment.ts | Removes the env var used for the old persisted version cache. |
| src/codeql.ts | Adds caching wrappers/type guards and refactors extractor resolution and JSON parsing. |
| src/cache.ts | New: implements the command-output cache (memo + temp file). |
| src/cache.test.ts | New: tests cache persistence/memo behavior and validation. |
| lib/entry-points.js | Generated output (content excluded by policy; not reviewed). |
Copilot's findings
Files excluded by content exclusion policy (1)
- lib/entry-points.js
- Files reviewed: 10/11 changed files
- Comments generated: 3
| // Tier 1: the in-memory variable. | ||
| const memoized = inMemoryCache.get(key); | ||
| if (memoized !== undefined) { | ||
| return memoized.output as T; | ||
| } |
| return getCachedOrRun( | ||
| CommandCacheKey.ResolveLanguages, | ||
| cmd, | ||
| () => | ||
| runCliJson<ResolveLanguagesOutput>(cmd, [ |
Similar to #3943, this PR caches the output of
codeql resolve languages, which contains the paths to the various extractors so that repeated calls toresolveLanguages()are idempotent. Additionally, re-implementresolveExtractor()as a wrapper overresolveLanguages()(to re-use the cached output) rather than shell out tocodeql resolve extractor.In one experiment, I counted seven instances of shelling out to
codeql resolve extractor. When you dig into the code, you can see why:resolveExtractor()is not called often or from many places; But one caller isisTracedLanguage(), which is wrapped byisScannedLanguage(). And these functions are often used in a loop/map over all/some languages. This can explain why we see consecutive executions ofcodeql resolve extractor.In support of the above goals, this PR also adds some additional functions to the
jsonmodule, to enable validation of thecodeql versionoutput.Risk assessment
For internal use only. Please select the risk level of this change:
Which use cases does this change impact?
Workflow types:
dynamicworkflows (Default Setup, Code Quality, ...).Products:
analysis-kinds: code-scanning.analysis-kinds: code-quality.upload-sarifaction.Environments:
github.comand/or GitHub Enterprise Cloud with Data Residency.How did/will you validate this change?
.test.tsfiles).pr-checks).If something goes wrong after this change is released, what are the mitigation and rollback strategies?
How will you know if something goes wrong after this change is released?
Are there any special considerations for merging or releasing this change?
Merge / deployment checklist